
    Self-Learning Classifier for Internet traffic

    Network visibility is a critical part of traffic engineering, network management, and security. Recently, unsupervised algorithms have been envisioned as a viable alternative to automatically identify classes of traffic. However, the accuracy achieved so far does not allow their use for traffic classification in practical scenarios. In this paper, we propose SeLeCT, a Self-Learning Classifier for Internet traffic. It uses unsupervised algorithms along with an adaptive learning approach to automatically let classes of traffic emerge, so that they can be identified and easily labeled. SeLeCT automatically groups flows into pure (or homogeneous) clusters by alternating simple clustering and filtering phases to remove outliers. SeLeCT uses an adaptive learning approach to boost its ability to spot new protocols and applications. Finally, SeLeCT also simplifies label assignment (which still requires some manual intervention) so that proper class labels can be easily discovered. We evaluate the performance of SeLeCT using traffic traces collected in different years from various ISPs located on 3 different continents. Our experiments show that SeLeCT achieves overall accuracy close to 98%. Unlike state-of-the-art classifiers, the biggest advantage of SeLeCT is its ability to help discover new protocols and applications in an almost automated fashion.
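As a rough illustration of the alternating clustering-and-filtering idea, the sketch below runs a plain k-means pass and then drops the farthest points of each cluster as outliers. The choice of k-means, the parameter names, and the `keep_frac` threshold are assumptions for illustration; the paper's actual phases may differ.

```python
import numpy as np

def cluster_and_filter(X, k=2, n_iters=10, keep_frac=0.9, seed=0):
    """One clustering + filtering round: k-means, then drop the
    farthest points per cluster as outliers (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # assign each flow to its nearest centroid
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    # filtering phase: keep only the keep_frac closest points per cluster
    dist = np.linalg.norm(X - centroids[labels], axis=1)
    kept = np.zeros(len(X), dtype=bool)
    for j in range(k):
        idx = np.where(labels == j)[0]
        if len(idx) == 0:
            continue
        cutoff = np.quantile(dist[idx], keep_frac)
        kept[idx[dist[idx] <= cutoff]] = True
    return labels, kept
```

Iterating such rounds lets compact, homogeneous clusters emerge while outliers are progressively discarded.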

    RECLAIM: Reverse Engineering Classification Metrics

    Being able to compare machine learning models in terms of performance is a fundamental part of improving the state of the art in a field. However, there is a risk of getting locked into using only a few -- possibly not ideal -- performance metrics, merely for comparability with earlier works. In this work, we explore the possibility of reconstructing new classification metrics starting from what little information may be available in existing works. We propose three approaches to reconstruct confusion matrices and, as a consequence, other classification metrics. We empirically verify the quality of the reconstructions, drawing conclusions on the usefulness that various classification metrics have for the reconstruction task.
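For the binary case, a confusion matrix can in fact be recovered in closed form whenever accuracy, precision, recall, and the sample count are all reported. The algebraic sketch below illustrates that route; it is an assumption for illustration and not necessarily one of the paper's three approaches.

```python
def reconstruct_confusion(accuracy, precision, recall, n):
    """Recover (TP, FP, FN, TN) of a binary confusion matrix from
    accuracy, precision, recall, and the number of samples n.
    Note: degenerate when precision == recall == 1 (denominator is 0)."""
    # recall = TP/(TP+FN)  ->  FN = TP*(1-recall)/recall
    # precision = TP/(TP+FP) -> FP = TP*(1-precision)/precision
    # accuracy = (TP+TN)/n  ->  TN = n*accuracy - TP
    # summing all four cells to n and solving for TP:
    denom = (1 - recall) / recall + (1 - precision) / precision
    tp = n * (1 - accuracy) / denom
    fn = tp * (1 - recall) / recall
    fp = tp * (1 - precision) / precision
    tn = n * accuracy - tp
    return tp, fp, fn, tn
```

When fewer metrics are reported, the system is under-determined, which is where less direct reconstruction strategies become necessary.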

    NetCluster: a Clustering-Based Framework for Internet Tomography

    Abstract — In this paper, Internet data collected via passive measurement are analyzed to obtain localization information on nodes by clustering (i.e., grouping together) nodes that exhibit similar network path properties. Since traditional clustering algorithms fail to correctly identify clusters of homogeneous nodes, we propose a novel framework, named “NetCluster”, suited to analyze Internet measurement datasets. We show that the proposed framework correctly analyzes synthetically generated traces. Finally, we apply it to real traces collected at the access link of our campus LAN and discuss the network characteristics as seen from the vantage point. I. INTRODUCTION AND MOTIVATIONS The Internet is a complex distributed system which continues to grow and evolve. The unregulated and heterogeneous structure of the current Internet makes it challenging to obtain…
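A minimal sketch of the grouping idea: nodes whose path-property vectors (e.g. per-path delays) are close are merged via threshold-based connected components. This is a deliberately naive stand-in for the actual NetCluster framework; the feature layout and threshold are hypothetical.

```python
import numpy as np

def group_nodes(features, threshold):
    """Group nodes whose path-property vectors are within `threshold`
    of each other, via connected components on a proximity graph."""
    n = len(features)
    # pairwise Euclidean distances between node feature vectors
    d = np.linalg.norm(features[:, None] - features[None], axis=2)
    adj = d <= threshold
    labels = -np.ones(n, dtype=int)
    cur = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        # flood-fill one connected component
        labels[i] = cur
        stack = [i]
        while stack:
            u = stack.pop()
            for v in np.where(adj[u] & (labels < 0))[0]:
                labels[v] = cur
                stack.append(v)
        cur += 1
    return labels
```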

    Highlighter: automatic highlighting of electronic learning documents

    Electronic textual documents are among the most popular teaching content accessible through e-learning platforms. Teachers or learners with different levels of knowledge can access the platform and highlight portions of textual content deemed particularly relevant. The highlighted documents can be shared with the learning community in support of oral lessons or individual learning. However, highlights are often incomplete or unsuitable for learners with different levels of knowledge. This paper addresses the problem of predicting new highlights in partly highlighted electronic learning documents. With the goal of enriching teaching content with additional features, text classification techniques are exploited to automatically analyze portions of documents enriched with manual highlights made by users with different levels of knowledge and to generate ad hoc prediction models. Then, the generated models are applied to the remaining content to suggest highlights. To improve the quality of the learning experience, learners may explore highlights generated by models tailored to different levels of knowledge. We tested the prediction system on real and benchmark documents highlighted by domain experts and compared the performance of various classifiers in generating highlights. The achieved results demonstrate the high accuracy of the predictions and the applicability of the proposed approach to real teaching documents.
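The underlying classification step can be pictured as training a per-knowledge-level text classifier on highlighted versus non-highlighted sentences. The toy multinomial Naive Bayes below (all sentences and names hypothetical) illustrates the idea; the paper compares several real classifiers.

```python
from collections import Counter
import math

def train_nb(sentences, labels):
    """Fit a tiny bag-of-words multinomial Naive Bayes model:
    labels are 1 (highlighted) or 0 (not highlighted)."""
    counts = {0: Counter(), 1: Counter()}
    priors = Counter(labels)
    for s, y in zip(sentences, labels):
        counts[y].update(s.lower().split())
    vocab = set(counts[0]) | set(counts[1])
    return counts, priors, vocab

def predict_nb(model, sentence):
    """Return the more likely class under Laplace-smoothed likelihoods."""
    counts, priors, vocab = model
    total = sum(priors.values())
    best, best_lp = None, -math.inf
    for y in (0, 1):
        lp = math.log(priors[y] / total)
        n_y = sum(counts[y].values())
        for w in sentence.lower().split():
            lp += math.log((counts[y][w] + 1) / (n_y + len(vocab)))
        if lp > best_lp:
            best, best_lp = y, lp
    return best
```

A separate model per knowledge level would then let each learner see highlights suggested for their own level.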

    Leveraging the explainability of associative classifiers to support quantitative stock trading

    Forecasting the stock market is particularly challenging due to the presence of a variety of inter-related economic and political factors. In recent years, the application of Machine Learning algorithms in quantitative stock trading systems has become established, as it enables a data-driven approach to investing in the financial markets. However, most professional traders still look for an explanation of automatically generated signals to verify their adherence to technical and fundamental rules. This paper presents an explainable approach to stock trading. It investigates the use of classification rules, which represent reliable associations between a set of discrete indicator values and the target class, to address next-day stock price prediction. Adopting associative classifiers in short-term stock trading not only provides reliable signals but also allows domain experts to understand the rationale behind signal generation. The backtesting of a state-of-the-art associative classifier, relying on a lazy pruning strategy, has shown promising performance in terms of equity appreciation and robustness of the trading system to market drawdowns.
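A classification rule of this kind maps a set of discrete indicator values to a trading signal. The sketch below shows how a list of such rules, assumed pre-sorted by confidence, could be matched against one day's discretized indicators; the rule contents and names (`RSI_bucket`, `MACD_sign`) are illustrative, not taken from the paper.

```python
def match_rules(rules, snapshot):
    """Return (signal, confidence) of the first rule whose antecedent
    matches today's discretized indicator values; rules are assumed
    sorted by decreasing confidence. Falls back to HOLD."""
    for antecedent, signal, confidence in rules:
        if all(snapshot.get(k) == v for k, v in antecedent.items()):
            return signal, confidence
    return "HOLD", 0.0
```

Because the matched antecedent is human-readable, a trader can inspect exactly which indicator conditions produced the signal.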

    Scaling associative classification for very large datasets

    Supervised learning algorithms are nowadays successfully scaling up to datasets that are very large in volume, leveraging the potential of in-memory cluster-computing Big Data frameworks. Still, massive datasets with a number of large-domain categorical features are a difficult challenge for any classifier. Most off-the-shelf solutions cannot cope with this problem. In this work we introduce DAC, a Distributed Associative Classifier. DAC exploits ensemble learning to distribute the training of an associative classifier among parallel workers and improve the final quality of the model. Furthermore, it adopts several novel techniques to reach high scalability without sacrificing quality, including a preventive pruning of classification rules, based on Gini impurity, in the extraction phase. We ran experiments on Apache Spark, on a real large-scale dataset with more than 4 billion records and 800 million distinct categories. The results showed that DAC improves on a state-of-the-art solution in both prediction quality and execution time. Since the generated model is human-readable, it can not only classify new records but also reveal both the logic behind the prediction and the properties of the model, making it a useful aid for decision makers.
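The Gini-impurity pruning mentioned above can be sketched as: compute the impurity of the class distribution over the records a rule matches, and discard rules that are too impure to be discriminative. The threshold and data layout here are assumptions for illustration.

```python
def gini(class_counts):
    """Gini impurity of a class-count distribution: 0 = pure."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

def prune_rules(rules, max_gini=0.3):
    """Preventively keep only rules whose supporting records are
    sufficiently class-pure (hypothetical rule representation)."""
    return [r for r in rules if gini(r["class_counts"]) <= max_gini]
```

Applying the filter during extraction, rather than after, keeps the per-worker rule sets small and the distributed training tractable.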

    Time-of-Flight Cameras in Space: Pose Estimation with Deep Learning Methodologies

    Recently introduced 3D Time-of-Flight (ToF) cameras have shown huge potential for mobile robotic applications, offering a smart and fast technology that outputs 3D point clouds, though still lacking in measurement precision and robustness. With the development of this low-cost sensing hardware, 3D perception is gathering more and more importance in robotics as well as in many other fields, and object registration continues to gain momentum. Registration is a transformation estimation problem between a source and a target point cloud, seeking the transformation that best aligns them. This work aims at building a full pipeline, from data acquisition to transformation identification, to robustly detect known objects observed by a ToF camera within a short range, estimating their 6-degrees-of-freedom pose. We focus this work on demonstrating the capability of detecting a part of a satellite floating in space, to support in-orbit servicing missions (e.g. for space debris removal). Experiments reveal that deep learning techniques can obtain higher accuracy and robustness w.r.t. classical methods, handling a significant amount of noise while still keeping real-time performance and low model complexity.
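At the core of rigid registration is estimating the transformation that best aligns matched point pairs, classically solved in closed form via SVD (the Kabsch/Umeyama solution). The sketch below shows only that classical building block; the paper's deep learning pipeline replaces or augments such steps.

```python
import numpy as np

def rigid_align(source, target):
    """Least-squares rigid transform (R, t) mapping matched source
    points onto target points: minimizes sum ||R @ s + t - q||^2."""
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    # cross-covariance of the centered point sets
    H = (source - mu_s).T @ (target - mu_t)
    U, _, Vt = np.linalg.svd(H)
    # reflection correction keeps det(R) = +1 (a proper rotation)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_t - R @ mu_s
    return R, t
```

In a full pipeline this solver sits inside an outer correspondence-search loop (e.g. ICP-style), since real ToF data does not come with known point matches.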

    Enhancing Interpretability of Black Box Models by means of Local Rules

    We propose a novel rule-based method that explains the prediction of any classifier on a specific instance by analyzing the joint effect of feature subsets on the classifier prediction. The relevant subsets are identified by learning a local rule-based model in the neighborhood of the prediction to explain. While local rules give qualitative insight into the local behavior, their relevance is quantified by using the concept of prediction difference.
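The prediction-difference idea can be sketched as: replace a feature subset with baseline values and measure how much the classifier's output changes. The function below is a minimal illustration; the choice of baseline and the paper's exact quantification may differ.

```python
def prediction_difference(predict, instance, subset, baseline):
    """Relevance of a feature subset as the change in the model's
    output when those features are replaced by baseline values.
    `predict` maps a feature dict to a score (hypothetical interface)."""
    perturbed = dict(instance)
    for f in subset:
        perturbed[f] = baseline[f]
    return predict(instance) - predict(perturbed)
```

A large difference means the subset jointly drives the local prediction; a near-zero difference means the local rule mentioning it carries little weight.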